Performance: build DocIdSetIterator in ArrayHitCounter to enable future optimizations #718
…ery to ArrayHitCounter
@alexklibisz I was going to open a PR to fix the issue the old HitCounter had, where it returned documents with zero hits. I see that you still have the error. I also analyzed your code a bit. Let's compute the time when a shard has D documents, H of which are a hit in any of the L hash tables.
Let me add a further comment about L and k.
These values show how the expression changes as k and L increase. A value of k=10 and L=100 will select about 1/10 of the documents, but L=100 with k < 6 selects most of the documents. This impacts your benchmarks here: #160 (comment), as you always use k=4 and L=100, so 99% of the documents are a hit!
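The expression the comment refers to is not shown here, so the following is only a sketch under assumptions. It uses the textbook LSH model, in which a document is a hit if it collides with the query in at least one of L tables, each keyed on k concatenated hashes. The function name `expected_hit_fraction` and the per-hash collision probability `p = 0.5` are hypothetical. That value of `p` was picked only because it roughly reproduces the figures quoted above.

```python
def expected_hit_fraction(k: int, L: int, p: float = 0.5) -> float:
    """Fraction of documents expected to be a hit in at least one of L tables.

    Assumes the standard LSH model: each of the k hashes in a table collides
    with the query independently with probability p (p = 0.5 is an assumption,
    not a value taken from the PR).
    """
    # A document matches a single table only if all k hashes agree.
    per_table = p ** k
    # It is a hit if it matches in at least one of the L independent tables.
    return 1.0 - (1.0 - per_table) ** L


if __name__ == "__main__":
    for k in (4, 5, 6, 10):
        fraction = expected_hit_fraction(k, L=100)
        print(f"k={k:2d}, L=100 -> expected hit fraction {fraction:.3f}")
```

Under these assumptions, k=10 with L=100 gives a fraction a little under 0.1, while k=4 with L=100 gives more than 0.99, which matches the "1/10" and "99%" figures in the comment.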
…lastiknn into 160-better-counter
Related Issue
#715
Changes
I have some ideas for improvements to the ArrayHitCounter. Most of the ideas are tightly-coupled to the way the DocIdSetIterator is constructed, so I'd like to bring that functionality into the HitCounter. That's what this PR is doing.
Testing and Validation
Added new tests and re-ran benchmarks